{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2ff7adbd",
   "metadata": {},
   "source": [
    "\n",
    "# Biological Variables and Data Validation in Biostatistics\n",
    "\n",
    "This notebook explores concepts such as biological variables, confounders, and data validation in biostatistics. It includes practical examples and Python-based methods for validation and analysis.\n",
    "\n",
    "## Topics Covered\n",
    "- Static vs. Dynamic Biological Variables\n",
    "- Biological Experiments and Statistical Integration\n",
    "- Confounders and Latent Variables\n",
    "- Validation of Biological Data\n",
    "\n",
    "### Prerequisites\n",
    "Install the required Python packages:\n",
    "```bash\n",
    "pip install numpy pandas matplotlib seaborn\n",
    "```\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99a1d3fb",
   "metadata": {},
   "source": [
    "\n",
    "## Static vs. Dynamic Biological Variables\n",
    "\n",
    "- **Static Variables:** Do not change over time (e.g., blood type, genetic traits).\n",
    "- **Dynamic Variables:** Change over time or under different conditions (e.g., blood pressure, BMI).\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "800ad040",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Example of static and dynamic variables\n",
    "data = pd.DataFrame({\n",
    "    \"Subject\": [1, 2, 3, 4],\n",
    "    \"BloodType\": [\"A\", \"B\", \"AB\", \"O\"],  # Static variable\n",
    "    \"BloodPressure\": [120, 130, 125, 140],  # Dynamic variable\n",
    "    \"BMI\": [22.5, 24.8, 23.1, 27.0]  # Dynamic variable\n",
    "})\n",
    "\n",
    "print(\"Example Data:\")\n",
    "print(data)\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac09406e",
   "metadata": {},
   "source": [
    "\n",
    "## Integrating Biological Experiments with Statistical Analysis\n",
    "\n",
    "Biological experiments provide data for statistical evaluation. For example, comparing ejection fractions across treatments can reveal the efficacy of a cardiology treatment.\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a110af56",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Example: Ejection Fraction Calculation\n",
    "data = pd.DataFrame({\n",
    "    \"Group\": [\"Treatment\", \"Control\"],\n",
    "    \"SystolicVolume\": [50, 45],\n",
    "    \"DiastolicVolume\": [120, 115]\n",
    "})\n",
    "\n",
    "# Calculate Ejection Fraction (%)\n",
    "data[\"EjectionFraction\"] = (data[\"SystolicVolume\"] / data[\"DiastolicVolume\"]) * 100\n",
    "\n",
    "print(\"Ejection Fraction Data:\")\n",
    "print(data)\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c1ddd39",
   "metadata": {},
   "source": [
    "\n",
    "## Identifying Confounders in Biological Data\n",
    "\n",
    "Confounders are variables that can bias the relationship between an independent variable and a dependent variable. Consider controlling for confounders in experiments.\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4521481f",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Example: Confounder Analysis\n",
    "data = pd.DataFrame({\n",
    "    \"Subject\": [1, 2, 3, 4],\n",
    "    \"BMI\": [22, 25, 28, 30],\n",
    "    \"ActivityLevel\": [\"High\", \"Medium\", \"Low\", \"Low\"],\n",
    "    \"Outcome\": [1, 0, 1, 0]  # Binary outcome (e.g., disease presence)\n",
    "})\n",
    "\n",
    "# Check for potential confounders\n",
    "print(\"Correlation between BMI and Outcome:\")\n",
    "print(data.corr())\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "103acedb",
   "metadata": {},
   "source": [
    "\n",
    "## Validating Biological Data\n",
    "\n",
    "Validation involves ensuring the accuracy and reliability of data by:\n",
    "1. Checking for missing values.\n",
    "2. Detecting duplicates.\n",
    "3. Ensuring data consistency.\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "529cf15e",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Data validation example\n",
    "data = pd.DataFrame({\n",
    "    \"Subject\": [1, 2, 2, 4],\n",
    "    \"BloodPressure\": [120, 130, None, 140],\n",
    "    \"BMI\": [22.5, 24.8, 24.8, None]\n",
    "})\n",
    "\n",
    "# Check for missing values\n",
    "print(\"Missing Values:\")\n",
    "print(data.isnull().sum())\n",
    "\n",
    "# Check for duplicates\n",
    "print(\"\n",
    "Duplicate Rows:\")\n",
    "print(data[data.duplicated()])\n",
    "\n",
    "# Fill missing values with median\n",
    "data[\"BloodPressure\"] = data[\"BloodPressure\"].fillna(data[\"BloodPressure\"].median())\n",
    "data[\"BMI\"] = data[\"BMI\"].fillna(data[\"BMI\"].median())\n",
    "\n",
    "print(\"\n",
    "Cleaned Data:\")\n",
    "print(data)\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99f6d6a1",
   "metadata": {},
   "source": [
    "\n",
    "## Visualizing Biological Data\n",
    "\n",
    "Use histograms and scatter plots to explore the distribution and relationships of biological variables.\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b14aefcb",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "# Histogram for BMI\n",
    "sns.histplot(data[\"BMI\"], kde=True)\n",
    "plt.title(\"BMI Distribution\")\n",
    "plt.xlabel(\"BMI\")\n",
    "plt.ylabel(\"Frequency\")\n",
    "plt.show()\n",
    "\n",
    "# Scatter plot for Blood Pressure vs. BMI\n",
    "sns.scatterplot(x=data[\"BMI\"], y=data[\"BloodPressure\"])\n",
    "plt.title(\"Blood Pressure vs. BMI\")\n",
    "plt.xlabel(\"BMI\")\n",
    "plt.ylabel(\"Blood Pressure\")\n",
    "plt.show()\n",
    "        "
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}
